Fast decisions reflect biases, slow decisions do not

Decisions are often made by heterogeneous groups of individuals, each with distinct initial biases and access to information of different quality. We show that in large groups of independent agents who accumulate evidence the first to decide are those with the strongest initial biases. Their decisions align with their initial bias, regardless of the underlying truth. In contrast, agents who decide last make decisions as if they were initially unbiased, and hence make better choices. We obtain asymptotic expressions in the large population limit that quantify how agents’ initial inclinations shape early decisions. Our analysis shows how bias, information quality, and decision order interact in non-trivial ways to determine the reliability of decisions in a group.

Evidence accumulation models are used widely to describe how different organisms integrate information to make choices [3]. Experimental evidence shows that these models capture the dynamics of the decision-making process of humans and other animals, including the tradeoff between speed and accuracy [5,25,30,34,37]. Such models can also be used to understand how decisions are made in social groups, both when individuals observe each other's choices [10,19,31,36] and when they act independently [33].
The accumulation of evidence is often modeled using biased Brownian motion, with the quality of evidence determining the magnitude of drift and diffusion. An agent is assumed to commit to a decision when the process crosses a threshold. Most previous evidence accumulation models describe a single agent. However, questions remain about how the order of choices in a group is related to their accuracy [40]. In a group of initially unbiased individuals accumulating evidence of different quality, the fastest and most accurate decisions are made by those accessing the highest quality information [31]. Here we ask how the initial biases of individuals in a group impact the order and accuracy of their choices. When is a decision driven mainly by an agent's initial bias as opposed to accumulated evidence?
We show that in large groups of agents starting with different initial biases, early decisions tend to be made by agents with the most extreme predispositions. The choices of these agents agree with their initial bias, regardless of the quality of the evidence they have access to. On the other hand, decisions of late deciders do not depend on their initial bias. Thus, in large groups early decisions reflect only initial inclinations, regardless of which choice is right. Late decisions reflect only accumulated evidence and are more likely to be correct. These effects hold generically, but not in the special case of initially unbiased agents [31].
Model description. We first assume that each individual in a population of $N$ agents has to decide between two choices (hypotheses), $H^+$ and $H^-$. They do so by accumulating evidence and computing the conditional probabilities, $P(H^\pm \mid \text{evidence})$, that one of the two hypotheses is correct. When observations are independent and each provides weak evidence, the log likelihood ratio, or belief, of agent $i$ in the group, $X_i = \log\left(P(H^+ \mid \text{evidence}_i)/P(H^- \mid \text{evidence}_i)\right)$, evolves approximately as a biased Brownian motion [3,7] (see Fig. 1A),

$$dX_i = \mu_i\,dt + \sqrt{2D_i}\,dW_i, \tag{1}$$

where the drift, $\mu_i$, and diffusion coefficient, $D_i$, capture the strength and noisiness of the evidence, respectively [26]. For all agents the correct choice ($H \in \{H^+, H^-\}$) is given by the sign of the drift ($\mathrm{sign}[\mu_i] = \pm 1$). Eq. (1) is widely used and accurately captures the dynamics of decisions in humans and animals, including variability in response time and the impact of evidence quality and biases on choice [8,23,30].
Agents start with an initial bias, $X_i(0)$, reflecting information or assumptions they have about the prior probability of either hypothesis [23]. We denote by $y$ the initial data for a generic agent. Each agent then accumulates evidence, and its belief evolves according to Eq. (1). Agent $i$ makes a decision when its belief reaches one of two thresholds, $-\theta < 0 < \theta$, at decision time $\tau_i := \inf\{t > 0 : X_i(t) \notin (-\theta, \theta)\}$. This decision is determined by the sign of the threshold reached, $\mathrm{sign}[X_i(\tau_i)]$. If decision criteria differ between agents, an appropriate rescaling of $X_i(0)$, $\mu_i$, and $D_i$ allows us to assume that all agents use the same thresholds [3].
Agents with the most extreme initial biases decide first. We show that in large groups agents whose initial biases are closest to one of the thresholds make the earliest decisions.
We first assume observers are identical except for their initial biases, so that $\mu_i = \mu$ and $D_i = D$ in Eq. (1). We denote by $T_i$ the $i$th decision time, so that $T_1 \le T_2 \le \cdots \le T_N$, where $T_i = \tau_{n(i)}$ and $n(i)$ is the index of the $i$th agent to decide. Hence, the index of the first decider is $n(1)$.
For simplicity, we assume that each agent starts with one of finitely many initial beliefs, $\{x_0, x_1, \ldots, x_{I-1}\}$, sampled with probability $q_i = P(X_j(0) = x_i)$ for $i = 0, \ldots, I-1$. The distance of the initial belief $x_i$ to the closest threshold is $L_i := \min\{\theta - x_i,\ \theta + x_i\}$. Let $0$ be the index of the unique most extreme initial belief held by an agent, so $L_0 < L_i$ for $i \neq 0$.
For a fixed number of initial beliefs, $I$, the first agent to decide in a large group is the one with the largest initial bias (Fig. 1A), in the sense that $P(X_{n(1)}(0) = x_0) \to 1$ as $N \to \infty$. More precisely, in the Supplemental Material (SM) we show that, as $N \to \infty$, for each $i \neq 0$ the probability $P(X_{n(1)}(0) = x_i)$ decays as a negative power of $N$ with a prefactor $\eta_i$ (Eq. (3)), where $\mu_i = \pm\mu$ if $x_i \gtrless 0$. The same statement holds if $n(1)$ is replaced by $n(j)$ in Eq. (3), but with a change in the prefactor, $\eta_i$ (see SM). Thus, the probability that the first decision is not made by the agent with the most extreme initial belief decreases as a negative power of the population size $N$ (Fig. 1B). The approximation given by Eq. (3) is in excellent agreement with the true probabilities when $N \gtrsim 10^3$ (see Fig. 1B inset). Moreover, the probability that the agents with the most extreme initial beliefs make the first decision is close to unity already for $N \approx 100$ when initial beliefs are well separated and drift is not too strong.
The choice of the fastest decider agrees with their initial bias: e.g., if $\theta$ is the threshold closest to the most extreme initial belief, $x_0$, then $P(X_{n(1)}(T_1) = \theta) \to 1$ as $N \to \infty$ (see Fig. 2A,B). Similar results hold when initial beliefs are drawn from a continuous distribution (see SM and next section). Thus, although all agents behave rationally, early decisions of biased agents tend to be less accurate [11,33].
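The claim can be probed with a small Monte Carlo experiment. The following is a minimal sketch, not the authors' simulation code: it assumes belief dynamics of the form $dX = \mu\,dt + \sqrt{2D}\,dW$ discretized by Euler-Maruyama, and the function names `first_decider` and `summarize`, the tie-breaking rule, and all parameter values are illustrative choices.

```python
import numpy as np

def first_decider(n_agents, biases, probs, mu=1.0, D=1.0, theta=1.0,
                  dt=1e-3, rng=None):
    """Return (initial bias, choice) of the first agent to cross +/-theta.

    Assumes dX = mu*dt + sqrt(2*D)*dW, stepped with Euler-Maruyama.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = rng.choice(np.asarray(biases, dtype=float), size=n_agents, p=probs)
    x = x0.copy()
    noise_scale = np.sqrt(2.0 * D * dt)
    while True:
        x += mu * dt + noise_scale * rng.standard_normal(n_agents)
        crossed = np.flatnonzero(np.abs(x) >= theta)
        if crossed.size > 0:
            # Illustrative tie-break: if several agents cross in the same
            # step, treat the one with the largest overshoot as first.
            i = crossed[np.argmax(np.abs(x[crossed]))]
            return x0[i], int(np.sign(x[i]))

def summarize(n_trials=50, n_agents=200, seed=0):
    """Fraction of trials in which the first decider was the most biased
    agent type, and in which it chose -theta although the drift is positive."""
    rng = np.random.default_rng(seed)
    out = [first_decider(n_agents, [-0.5, 0.0], [0.5, 0.5], rng=rng)
           for _ in range(n_trials)]
    bias = np.array([b for b, _ in out])
    choice = np.array([c for _, c in out])
    frac_extreme = float(np.mean(bias == -0.5))
    frac_wrong = float(np.mean(choice == -1))
    return frac_extreme, frac_wrong
```

With half of the agents starting at $-0.5$ and half at $0$, one would expect both fractions returned by `summarize()` to be close to one: the first decider starts at the extreme bias and chooses $-\theta$ even though the drift favors $+\theta$.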
In contrast, the probability that a single agent (or one chosen randomly without regard to decision order) decides incorrectly can be made arbitrarily small by increasing the drift or threshold [3]. In large populations with biased agents, drift and diffusion impact the probability of the first decision only through the prefactor in Eq. (3), $\eta_i$, and thus decrease in importance as population size diverges. If even a small proportion of a large population holds an initial bias, early decisions are determined by the most extreme bias (Fig. 2B) regardless of the drift (Fig. 2C). On the other hand, if all deciders are initially unbiased ($X_i(0) = 0$ for all $i$), the probability the first decider makes a correct choice is $(1 + \exp(-\mu\theta/D))^{-1}$ [3].
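For reference, the unbiased accuracy quoted here is the $x = 0$ case of the standard exit-probability formula for a drift-diffusion process. The following is a sketch under stated assumptions: dynamics $dX = \mu\,dt + \sqrt{2D}\,dW$ with $\mu > 0$, so that $+\theta$ is the correct threshold, and $p_\theta(x)$ is used here to denote the probability of reaching $+\theta$ before $-\theta$ from $X(0) = x$.

```latex
% p_theta solves D p'' + mu p' = 0 on (-theta, theta),
% with p(-theta) = 0 and p(theta) = 1.
p_\theta(x) = \frac{1 - e^{-\mu (x + \theta)/D}}{1 - e^{-2\mu\theta/D}},
\qquad
p_\theta(0) = \frac{1 - e^{-\mu\theta/D}}{1 - e^{-2\mu\theta/D}}
            = \frac{1}{1 + e^{-\mu\theta/D}} .
```

A bias toward the wrong threshold ($x < 0$) lowers $p_\theta(x)$ below this value, and a bias toward the correct threshold raises it.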
Heterogeneous population and continuous distribution of initial biases. While we can obtain the most precise asymptotic results in the homogeneous case, our conclusions extend to populations of agents with heterogeneous distributions of initial biases, drifts, diffusivities, and thresholds. We again assume that each agent starts with one of finitely many initial beliefs, $X_i(0) \in \{x_0, x_1, \ldots, x_{I-1}\}$, with drift and diffusivity sampled from a finite set of fixed size. For each agent we define the diffusive timescale,

$$S_i := \frac{L_i^2}{4D_i}, \tag{4}$$

where $L_i$ is the distance from the agent's initial belief to the closest threshold. By assumption, the timescales $S_i$ follow a discrete distribution $P(S = s_i) > 0$ with support on a finite set $0 < s_0 \le s_1 \le s_2 \le s_3 \le \cdots \le s_J$, and $S_{n(j)}$ refers to the timescale of the $j$th agent to decide (see Fig. 2D). We denote by $s$ the diffusive timescale of a generic agent.
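To make the role of the diffusive timescale concrete, here is a minimal sketch. It assumes the timescale takes the form $S = L^2/(4D)$, with $L = \theta - |x_0|$ the distance from the initial belief to the nearer threshold; the function name `diffusive_timescale` and the three agent types are hypothetical and chosen only for illustration.

```python
def diffusive_timescale(x0, D, theta=1.0):
    """S = L^2 / (4 D), with L the distance from x0 to the nearer of +/-theta."""
    if not -theta < x0 < theta:
        raise ValueError("initial belief must lie strictly between the thresholds")
    if D <= 0:
        raise ValueError("diffusivity must be positive")
    L = theta - abs(x0)
    return L * L / (4.0 * D)

# Hypothetical population: (initial belief, diffusivity) per agent type.
agent_types = {
    "unbiased, noisy": (0.0, 8.0),
    "biased, quiet": (0.5, 0.5),
    "biased, typical": (0.5, 1.0),
}

# Agent types ordered from shortest to longest diffusive timescale.
ranked = sorted(agent_types, key=lambda k: diffusive_timescale(*agent_types[k]))
```

In this toy population the unbiased but very noisy type has the shortest timescale, which illustrates that a large diffusivity can shorten the effective distance to threshold as much as a strong initial bias does.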
In large populations, early deciders are those with the shortest diffusive timescales. In particular, we show in the SM that for every $\varepsilon > 0$, fixed $j \ge 1$, and each $s_i > s_0$,

$$N^{1 - s_i/s_0 - \varepsilon} \ll P(S_{n(j)} = s_i) \ll N^{1 - s_i/s_0 + \varepsilon},$$

where we use the notation $f \ll g$ to mean $\lim_{N \to \infty} f/g = 0$. We can thus conclude that $P(S_{n(j)} = s_0) \to 1$ as $N \to \infty$. These results agree with our earlier conclusion: if all agents share the same diffusivity, then the fastest deciders are the agents who start closest to their decision thresholds. This is true regardless of the quality of the evidence they receive. Diffusivity can reduce the effective distance to the threshold according to Eq. (4). Thus, the fastest deciders are either those with the most extreme initial biases or those with the noisiest integration process, regardless of the drift, $\mu_i$. Indeed, how we model drift does not impact these conclusions, and they hold even if we model the evolution of beliefs as an Ornstein-Uhlenbeck process, as is frequently done in the psychophysics literature [4,38].
Late deciders make decisions as if initially unbiased. We expect in large populations the inaccuracy of early deciders to be balanced by higher accuracy of late deciders [33]. Thus, we next determine the probability that the last agent to decide makes a correct decision. In the SM we show that this probability has an intuitive form,

$$\lim_{N \to \infty} P\left(X_{n(N)}(T_N) = \theta\right) = \int_{-\theta}^{\theta} p_\theta(x)\,q(x)\,dx. \tag{6}$$

Here $p_\theta(x)$ is the probability that a single agent with initial bias $X(0) = x$ makes a correct decision, and $q(x)$ is the quasi-stationary distribution [27] of beliefs evolving according to Eq. (1). Thus the decision of the last decider is made as if they forget their actual initial bias and instead sample an initial belief from the quasi-stationary distribution, $q(x)$.
Eq. (6) is general and can be extended to arbitrary domains. When applied to the drift-diffusion process with decision boundaries at $\pm\theta$, we show in the SM that $P(X_{n(N)}(T_N) = \theta) \to (1 + \exp(-\mu\theta/D))^{-1}$ as $N \to \infty$, which is the probability that a single, initially unbiased decider makes a correct decision (see Fig. 3A) [3]. Thus, the last decider forgets their initial bias and makes decisions based only on the accumulated evidence. The probability that an agent with a large initial bias makes a late decision is small. But should this happen, the initial bias will have little impact on their decision (see Fig. 3B).

FIG. 4 (caption text). Beliefs about three options evolve on an equilateral triangle. Here, $\theta$ is the closest distance from the center of the triangle (burgundy ring) to the boundary. The initial bias is the distance from the triangle center to the initial belief, $X_i(0)$. As $N$ increases, the probability that the most biased agent chooses first grows. Curves are computed by averaging $10^6$ stochastic simulations. Inset: Sample trajectories from a trial with biases sampled with equal probability from $\{\theta/2, \theta/4, \theta/8\}$. The first agent to decide (red) has the largest initial bias. The belief of the last decider (blue) explores the space before reaching a threshold.
Extension to multiple alternatives. We can extend these results to decisions between $k$ alternatives. Eq. (1) again describes the evolution of beliefs, but now $X_i$ and $\mu_i$ are vectors in $\mathbb{R}^{k-1}$ and $W_i$ is a vector of independent Wiener processes [28]. Each belief evolves on a domain, $\Omega \subset \mathbb{R}^{k-1}$, with $k$ boundaries [21], each associated with one of the alternatives. Agent $i$ chooses alternative $j$ if its belief, $X_i(t)$, crosses the associated boundary first. The boundaries that lead to the best decisions are difficult to find analytically [35], but their exact shape is immaterial for our result.
In the SM we show that Eq. (3) holds for general domains in arbitrary dimensions (see Fig. 4). We therefore reach our earlier conclusions: in large homogeneous populations, the agents holding the most extreme initial beliefs make the first decisions, and their choices are consistent with their initial biases. Our conclusions about late decisions also carry over to agents facing multiple choices: the natural extension of Eq. (6) holds with $q(x)$ the quasi-stationary distribution on $\Omega$. The last decider makes a choice as if it sampled its initial belief from this quasi-stationary distribution.
Discussion. Our decisions are often influenced by information we obtained previously and predilections we develop. In drift-diffusion models, prior evidence and initial inclinations are often represented by a shift in the initial state. We have shown that initial biases determine early decisions and have a diminishing impact on later decisions.
An agent unaware of the order of their decision would believe this decision was made according to the evidence the agent accumulated and that the accuracy of their choice is determined only by the decision threshold [3]. Though early decisions are not always less accurate [10], our work identifies a clear case in which hasty choices tend to be the most unreliable. Our findings also suggest a means of weighting choices of biased agents according to decision order in a large group when formulating collective decisions [20]. However, in social groups the exchange of social information between agents [1,22] or correlations in the evidence [33] will affect these results.
Ramping activity of individual neurons during decision making has been observed across the brain [12,32] (although see [14]). Such dynamics may reflect the underlying evidence accumulation process preceding a decision and are often modeled by a drift-diffusion process.
Decisions are thought to be triggered by the elevated activity of sufficiently many choice-related neurons [39]. Our results suggest that in large neural populations decisions reflect the most extreme initial neural states, rather than the accumulated evidence, if the activity is uncorrelated. Since neural activity is often correlated [6], the effect of such biases could be tempered.
While we have interpreted our results in the context of social decision theory, they apply more generally to independently evolving drift-diffusion processes on bounded domains [17]: in large populations early threshold crossings reflect only the initial states, while late crossings are independent of initial states and reflect the quasi-stationary distribution. Hence, early crossings reflect initial biases, providing the fast reactions needed for deadlined biophysical processes [9]. If time allows, quorum sensing processes that weight passages by order could be used [13]. Thus, our theory shows how initial biases can be used to implement population-level tradeoffs between speed and accuracy.

MATHEMATICAL PRELIMINARIES
Suppose $\{(\tau_n, Z_n)\}_{n \ge 1}$ is an independent and identically distributed (iid) sequence of realizations of the pair of (possibly correlated) random variables $(\tau, Z)$. We have in mind that $\tau$ is the decision time (or first passage time (FPT)) of some decider whose stochastic evolution of beliefs is denoted by $\{X(t)\}_{t \ge 0}$ and $Z$ is a vector containing information about this decider, such as their random initial position, drift, diffusivity, and decision made.
Define the cumulative distribution function (CDF) of $\tau$,

$$F(t) := P(\tau \le t).$$

Further, for any event $E$ that is in the $\sigma$-algebra generated by $Z$, define

$$F_E(t) := P(\tau \le t \cap E).$$

In words, $E$ is any event for which we can know whether or not it occurred by knowing $Z$. For example, we are interested in events $E$ like $E = \{X(0) = \theta/2\}$ or $E = \{X(0) \le 0\}$. For a given $N \ge 1$, let $n(j) \in \{1, \ldots, N\}$ denote the (random) index of the $j$th fastest decider out of the first $N$ deciders to make a decision. That is, suppose we order the first $N$ FPTs (or first decision times),

$$T_{1,N} \le T_{2,N} \le \cdots \le T_{N,N},$$

where $T_{j,N}$ denotes the $j$th fastest FPT. Then $n(j)$ is such that

$$\tau_{n(j)} = T_{j,N}. \tag{8}$$

In the examples of interest, the FPTs, $\tau$, have continuous probability distributions (i.e. $F(t)$ is a continuous function), so the event $\tau_n = \tau_{n'} < \infty$ for $n \neq n'$ has probability zero and there is no ambiguity in Eq. (8).
Since we have the sequence $\{(\tau_n, Z_n)\}_{n \ge 1}$, we denote by $E_n$ the event $E$ as it pertains to the $n$th element in the sequence. For example, if $E = \{X(0) = \theta/2\}$, then $E_n = \{X_n(0) = \theta/2\}$. Similarly, $E_{n(j)}$ is the event $E$ as it pertains to $Z_{n(j)}$.
Throughout the Supplemental Material, we use the notation $\int f(t)\,dg(t)$ to denote the Riemann-Stieltjes integral of a function $f$ with respect to a function $g$.
Proposition 1. For any $j \in \{1, 2, \ldots, N\}$ (denoting an agent by the order $j$ of their decision), we have that

$$P(E_{n(j)}) = j\binom{N}{j}\int_0^\infty (F(t))^{j-1}(1 - F(t))^{N-j}\,dF_E(t). \tag{9}$$

In the case $j = 1$ (i.e. the fastest decider), Proposition 1 implies

$$P(E_{n(1)}) = N\int_0^\infty (1 - F(t))^{N-1}\,dF_E(t). \tag{10}$$

Since $1 - F$ is a decreasing function, Eq. (10) implies that the short-time behavior of $F$ and $F_E$ determines the large $N$ behavior of $P(E_{n(1)})$. More generally, Proposition 1 implies that the short-time behavior of $F$ and $F_E$ determines the large $N$ behavior of $P(E_{n(j)})$ for $1 \le j \ll N$.
In the case $j = N$ (i.e. the slowest decider), Proposition 1 implies

$$P(E_{n(N)}) = N\int_0^\infty (F(t))^{N-1}\,dF_E(t). \tag{11}$$

Since $F$ is an increasing function, Eq. (11) implies that the large-time behavior of $F$ and $F_E$ determines the large $N$ behavior of $P(E_{n(N)})$. More generally, Proposition 1 implies that the large-time behavior of $F$ and $F_E$ determines the large $N$ behavior of $P(E_{n(N-j)})$ for $1 \ll N - j$.
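An order-statistics representation of this kind can be sanity-checked numerically. The sketch below is an illustration on a toy model, not the drift-diffusion setting of the paper: it assumes the representation $P(E_{n(j)}) = j\binom{N}{j}\int F^{j-1}(1-F)^{N-j}\,dF_E$, takes agents of two hypothetical types with exponentially distributed decision times, and compares trapezoidal quadrature with direct simulation. The names `q`, `r`, `prob_type_of_jth`, and `monte_carlo` are introduced here for illustration.

```python
import math
import numpy as np

# Toy model (hypothetical): each agent has a type Z in {0, 1} with
# P(Z = k) = q[k]; given Z = k, tau ~ Exponential(rate r[k]).
q = np.array([0.5, 0.5])
r = np.array([1.0, 3.0])

def prob_type_of_jth(N, j, k, t_max=60.0, n_grid=600_001):
    """P(Z_{n(j)} = k) via j*C(N,j) * int F^(j-1) (1-F)^(N-j) dF_E."""
    t = np.linspace(0.0, t_max, n_grid)
    survival = (q[:, None] * np.exp(-np.outer(r, t))).sum(axis=0)  # 1 - F(t)
    F = 1.0 - survival
    f_E = q[k] * r[k] * np.exp(-r[k] * t)    # density of F_E for E = {Z = k}
    integrand = F ** (j - 1) * survival ** (N - j) * f_E
    # Trapezoidal rule on the uniform grid.
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))
    return j * math.comb(N, j) * float(integral)

def monte_carlo(N, j, k, n_trials=200_000, seed=1):
    """Direct estimate of P(Z_{n(j)} = k) by sorting simulated decision times."""
    rng = np.random.default_rng(seed)
    Z = rng.choice(2, size=(n_trials, N), p=q)
    tau = rng.exponential(1.0 / r[Z])
    jth = np.argsort(tau, axis=1)[:, j - 1]  # index n(j) in each trial
    return float(np.mean(Z[np.arange(n_trials), jth] == k))
```

For example, `prob_type_of_jth(5, 1, 1)` and `monte_carlo(5, 1, 1)` should agree up to sampling error, and the quadrature values for the two types should sum to one for any fixed $j$.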

SOME INTEGRAL ASYMPTOTICS
The following proposition is useful for estimating the large $N$ behavior of some integrals of the form in Eq. (9) and was proved in [17] (see Proposition 2 in [17]). Throughout the Supplemental Material, "$f \sim g$" denotes $f/g \to 1$ (e.g., as $N \to \infty$ or as $t \to 0$).
The following result estimates integrals of the form in Eq. ( 9) for 1 ≤ j ≪ N assuming that F (t) and F + (t) have short-time t behavior that is characteristic of diffusion.
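Theorem 3. Assume $F(t)$ and $F_+(t)$ are bounded, nondecreasing, continuous from the right, and satisfy

$$F(t) \sim A\,t^{p}\,e^{-C_0/t} \quad \text{as } t \to 0^+, \tag{12}$$

$$F_+(t) \sim B\,t^{q}\,e^{-C_+/t} \quad \text{as } t \to 0^+, \tag{13}$$

where $C_+ > C_0 > 0$, $A > 0$, $B > 0$, and $p, q \in \mathbb{R}$. Then for any fixed integer $j \ge 1$, we have

$$j\binom{N}{j}\int_0^\infty (F(t))^{j-1}(1 - F(t))^{N-j}\,dF_+(t) \sim \eta\,(\ln N)^{p\beta - q}\,N^{1-\beta} \quad \text{as } N \to \infty,$$

where

$$\beta := \frac{C_+}{C_0} > 1, \qquad \eta = \eta(j) := \frac{B\,\beta\,\Gamma(j + \beta - 1)}{A^{\beta}\,C_0^{\,p\beta - q}\,\Gamma(j)},$$

and $\Gamma(x) := \int_0^\infty z^{x-1}e^{-z}\,dz$ denotes the gamma function.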
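A heuristic sketch may help to see where a power-law decay in $N$ comes from in the case $j = 1$. It is not a proof and is not taken from the source: it assumes short-time behavior of the form $F(t) \sim A t^{p} e^{-C_0/t}$ and $F_+(t) \sim B t^{q} e^{-C_+/t}$ with $C_+ > C_0 > 0$, and the ratio $\beta$ and the rescaled variable $u$ are notation introduced here.

```latex
% Heuristic sketch for j = 1 (not a proof). Assumed short-time forms:
%   F(t) ~ A t^p e^{-C_0/t},   F_+(t) ~ B t^q e^{-C_+/t},   C_+ > C_0 > 0.
\begin{align*}
\beta &:= C_+/C_0 > 1, \qquad u := N F(t)
  \;\Rightarrow\; (1 - F(t))^{N-1} \approx e^{-u}, \quad t \approx C_0/\ln N, \\
F_+(t) &\approx B t^{q}\bigl(e^{-C_0/t}\bigr)^{\beta}
  \approx B A^{-\beta}\, t^{\,q - p\beta}\, F(t)^{\beta}
  \approx B A^{-\beta}\, C_0^{\,q - p\beta} (\ln N)^{p\beta - q}\, (u/N)^{\beta}, \\
N \int_0^\infty (1 - F)^{N-1}\, dF_+
  &\approx B A^{-\beta}\, C_0^{\,q - p\beta} (\ln N)^{p\beta - q} N^{1-\beta}
     \int_0^\infty e^{-u}\, \beta u^{\beta - 1}\, du \\
  &= \Gamma(\beta + 1)\, B A^{-\beta}\, C_0^{\,q - p\beta}\,
     (\ln N)^{p\beta - q}\, N^{1-\beta}.
\end{align*}
```

Because $\beta > 1$, the result decays as a power of $N$ up to a logarithmic factor; only the prefactor depends on $A$, $B$, $p$, and $q$.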
Notice that the asymptotic behavior found in Theorem 3 as $N \to \infty$ is independent of $j \ge 1$, except for the constant prefactor $\eta(j)$. Further, this prefactor is an increasing function of $j$. The asymptotic behavior in Eqs. (12)-(13) is typical for diffusion, but computing the prefactors $A$ and $B$ and the powers $p$ and $q$ can be challenging [15]. Indeed, these constants depend on the details of the system (e.g., drift, space dimension, geometry of the domain, etc.). However, the constants in the exponents, $C_0$ and $C_+$, are more universal and can be obtained in a very general mathematical setting [16]. The following result yields estimates on the fastest deciders when we only know these constants, which is equivalent to knowing the short-time behavior of $F_+(t)$ and $F(t)$ on a logarithmic scale.
Theorem 4. Assume $F(t)$ and $F_+(t)$ are bounded, nondecreasing, continuous from the right, and satisfy where $C_+ > C_0 > 0$. Then for every $\varepsilon > 0$, where If, in addition, we assume that then for every $\varepsilon > 0$,

The following result estimates integrals of the form in Eq. (9) for $1 \ll N - j \le N$ assuming that $F(t)$ and $f_i(t) = F_i'(t)$ have large-time $t$ behavior that is characteristic of diffusion in a bounded domain.
Theorem 5. Assume F (t) ∈ [0, 1) is continuous and nondecreasing and f i (t) is continuous and bounded and where λ > 0, c > 0, c i > 0. Then for any fixed j ≥ 0, we have that

PROOF OF EQ. (3) IN MAIN TEXT
We now apply Theorem 3 to obtain Eq. (3) in the main text. Suppose the belief of each agent evolves independently according to the following stochastic differential equation,

$$dX = \mu\,dt + \sqrt{2D}\,dW, \tag{18}$$

where $\mu \in \mathbb{R}$ is a constant drift, $D > 0$ is a constant diffusivity, and $W = \{W(t)\}_{t \ge 0}$ is a standard Brownian motion. Define the FPT,

$$\tau := \inf\{t > 0 : X(t) \notin (-\theta, \theta)\},$$

for some threshold $\theta > 0$. Assume that the initial distribution of each agent is a sum of Dirac masses at a finite set of points $\{x_0, x_1, \ldots, x_{I-1}\}$, with $q_i = P(X(0) = x_i)$, and let $L_i$ be the distance to the closest threshold from $x_i$, $L_i := \min\{\theta - x_i,\ \theta + x_i\}$. Further, we assume $0 \in \{0, 1, \ldots, I-1\}$ is the index of the unique starting location closest to a threshold, so that $L_0 < L_i$ for $i \neq 0$. Thus, when $N$ is large the first decider out of many deciders is, with probability approaching one, the one with the most extreme initial bias. Using the integral representation in Proposition 1 and applying Theorem 3 yields where and where

FIRST DECISION AGREES WITH INITIAL BIAS
The analysis above shows that the first agent to decide in a large group has the most extreme initial bias. We now show the intuitive result that this first decider's decision agrees with their initial bias. Without loss of generality, assume that the most extreme initial bias is negative, $x_0 < 0$. Letting $F_+(t) = P(\tau \le t \cap X(\tau) = +\theta)$, we have where $i_+ \in \{1, \ldots, I\}$ is the index of the starting location closest to $+\theta$. Using the integral representation in Proposition 1 and applying Theorem 3 yields

CONTINUOUS INITIAL BELIEF DISTRIBUTION
In the section on the proof of Eq. (3), we showed that the first of many deciders has the most extreme initial belief in the case that the population has a discrete initial belief distribution. We now generalize this calculation to the case that the deciders have a continuous initial belief distribution.
In particular, suppose that the decider's initial belief (position) has a smooth probability density ν(x) with support (a, b) with −θ < a < b < θ.Suppose that where the coefficients are positive, ν a > 0, ν b > 0, and the powers ensure that ν is integrable, In light of ( 19), suppose that where It follows that A(x)ν(x)e −C(x)/t dx as t → 0 + .
We thus need to estimate the small time t asymptotics of the integral which is an exercise in Laplace's method [2].If b > 0, then for any ε ∈ (0, b), we have Similarly, if a < 0, then for any ε ∈ (0, |a|), we have Putting this together, we have that if b > |a|, then With these estimates, we can apply Theorem 3 to obtain estimates that the fastest decider(s) have extreme initial beliefs.In particular, suppose we want to estimate which is the probability that the fastest decider does not have extreme initial beliefs.If we define the event then using the notation of Section , we have that x)e −C(x)/t dx as t → 0 + , which can be estimated as above using Laplace's method [2].In particular, if b > |a|, then

HETEROGENEOUS POPULATION WITH MULTIPLE ALTERNATIVES
We next consider the generalized case where the beliefs of the agents in the population evolve as processes with (possibly space-dependent) drift, diffusion coefficient, initial position, and even domain (in their own arbitrary space dimension $d \ge 1$). Suppose the belief of the $i$th decider evolves according to the following $d$-dimensional SDE,

$$dX_i = \mu_i(X_i)\,dt + \sqrt{2D_i}\,dW, \tag{21}$$

where $\mu_i : \mathbb{R}^d \to \mathbb{R}^d$ is a possibly space-dependent drift, $D_i > 0$ is the diffusion coefficient, and $W(t) \in \mathbb{R}^d$ is a standard Brownian motion in $d$-dimensional space.
Let $L > 0$ denote the (random) shortest distance an agent must travel to hit the closest target and let $D > 0$ denote the agent's diffusion coefficient. Define the random timescale $S := L^2/(4D)$. Suppose that $S$ has a discrete distribution on a finite set. Since we have $N \ge 1$ iid agents indexed from $n = 1$ to $n = N$, we let $S_n$ denote the value of $S$ for the $n$th agent and $S_{n(j)}$ the value of $S$ for the $j$th fastest to decide.
We have that [16] Hence, Proposition 1 and Theorem 4 imply that for any fixed j ≥ 1 and i ∈ {1, . . ., I} and any ε > 0, where we use the notation f ≪ g to mean lim f /g = 0.That is, in more traditional notation, In the special case that the agents all move in one space dimension and the drifts are spatially constant (but may differ between agents), we can get the constant and logarithmic prefactors on the decay of P(S n(j) = s i ) as N → ∞.
The result in Eq. ( 22) says that in a large population if all the agents have the same diffusion coefficient, then the fastest deciders started closest to their decision thresholds (targets).If we allow the diffusion coefficients to vary between agents, then (22) implies that the fastest deciders started close to their decision thresholds and/or they had big diffusion coefficients.
using the boundary conditions in (24) and the following weighted inner product, where (f, g) = U f (x)g(x) dx denotes the standard L 2 -inner product (i.e. with no weight function).Expanding the solution to (24) yields, where denote the (necessarily positive) eigenvalues of −L.The corresponding with eigenfunctions {u n (x)} n≥1 satisfy the following time-independent equation, and identical boundary conditions as S. Further, the eigenfunctions are orthogonal and are taken to be orthonormal, which means that where δ nm denotes the Kronecker delta function (i.e.δ nn = 1 and δ mn = 0 if n ̸ = m).
If the initial distribution of an agent has probability measure µ 0 , then the FPT τ has survival probability given by where the condition X(0) = d µ 0 in the conditional probability merely denotes that X(0) has initial distribution given by µ 0 .Hence, we obtain the following representation for the survival probability, where the coefficients are given by the following integrals, We have that the FPT τ to one of the targets has CDF and therefore Applying Proposition 1 and Theorem 5 yields Now, the solution to the forward Fokker-Planck equation is given by Hence, u 1 (x)ρ(x)/(u 1 , ρ) is the quasi-stationary distribution (QSD), q(x), defined by Summarizing, we have shown that The case of drift-diffusion processes in one dimension.For the one-dimensional example in which all the beliefs of all the agents evolve according to (18), we can compute the QSD, and find that .
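Further, it is straightforward to show that the probability that a decider with $X(0) = x$ reaches $+\theta$ before $-\theta$ is

$$p_\theta(x) = \frac{1 - e^{-\mu(x + \theta)/D}}{1 - e^{-2\mu\theta/D}}.$$

Therefore, applying (33) and explicitly computing the integral yields

$$\int_{-\theta}^{\theta} p_\theta(x)\,q(x)\,dx = \frac{1}{1 + e^{-\mu\theta/D}}.$$

Hence, the slowest deciders out of $N \gg 1$ deciders make a decision as if they were initially unbiased (i.e. as if $X(0) = 0$).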
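The one-dimensional computation can be checked numerically. The following is a minimal sketch under assumptions stated here rather than taken from the source: dynamics $dX = \mu\,dt + \sqrt{2D}\,dW$ with $\mu \neq 0$ on $(-\theta, \theta)$, the hitting probability $p_\theta(x) = (1 - e^{-\mu(x+\theta)/D})/(1 - e^{-2\mu\theta/D})$, and a quasi-stationary density proportional to $e^{\mu x/(2D)}\cos(\pi x/(2\theta))$, which is the principal eigenfunction of the forward operator with absorbing boundaries as derived for this sketch. All function names are illustrative.

```python
import numpy as np

def hit_upper_prob(x, mu, D, theta):
    """P(reach +theta before -theta | X(0) = x); requires mu != 0."""
    return (1.0 - np.exp(-mu * (x + theta) / D)) / (1.0 - np.exp(-2.0 * mu * theta / D))

def qsd_unnormalized(x, mu, D, theta):
    """Assumed quasi-stationary shape: exp(mu x / (2D)) * cos(pi x / (2 theta))."""
    return np.exp(mu * x / (2.0 * D)) * np.cos(np.pi * x / (2.0 * theta))

def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def last_decider_accuracy(mu=1.0, D=1.0, theta=1.0, n_grid=200_001):
    """Integral of p_theta(x) q(x) over (-theta, theta), with q normalized."""
    x = np.linspace(-theta, theta, n_grid)
    q = qsd_unnormalized(x, mu, D, theta)
    q = q / trapezoid(q, x)
    return trapezoid(hit_upper_prob(x, mu, D, theta) * q, x)

def unbiased_accuracy(mu=1.0, D=1.0, theta=1.0):
    """Probability that an agent starting at X(0) = 0 reaches +theta first."""
    return float(1.0 / (1.0 + np.exp(-mu * theta / D)))
```

Under these assumptions `last_decider_accuracy(mu, D, theta)` and `unbiased_accuracy(mu, D, theta)` should agree to within quadrature error for any admissible parameters, which is the statement that sampling the initial belief from the quasi-stationary distribution is equivalent to starting unbiased.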

PROOFS
Proof of Proposition 1.Since {(τ n , Z n )} n≥1 are identically distributed, we have that where the coefficient comes from noting that the number of terms in the sum is obtained by choosing the j fastest FPTs out of N and then choosing which of those j will be the jth fastest.Define +∞ if A j does not occur, so that if j < N , P(max{τ 1 , . . ., τ j−1 } < τ j < min{τ j+1 , . . ., τ N } ∩ A j ) = P(max{τ 1 , . . ., τ j−1 } < τ Using Eq. ( 38) and integrating by parts yields The first term in the righthand side of Eq. ( 40) vanishes exponentially fast as N → ∞.To handle the second term in the righthand side of Eq. ( 40), note that Eq. ( 39) implies that Since the third term in the righthand side of Eq. ( 40) is nonpositive, applying Proposition 2 to Eq. ( 41) and using Eq. ( 40) and the fact that I δ,∞ vanishes exponentially fast as N → ∞ completes the proof of Eq. ( 16).

NUMERICAL SOLUTIONS
Numerical solutions were computed via trapezoidal quadrature on Eq. ( 9) in Proposition 1.In each set of dynamics, we rescaled the drift-diffusion process on [−θ, θ] to the interval [0, ℓ].The probability density function for hitting the left boundary in this system is [24] f where The expressions in Eq. ( 46) are equivalent but have distinct utility: the top expansion converges quickly for large s while the bottom expansion converges quickly for small s.
Hence, we utilize both expressions to more accurately compute probabilities associated with slow and fast deciders, respectively.
Integrating Eq. (45) yields x 0 ℓ with long-and short-time expansions of Φ(s, w) given by Where numerical solutions are illustrated, we use the short-time expressions of ϕ and Φ for 10 −10 ≤ t ≤ 1 and the complementary long-time expressions for 1 < t ≤ 100, discretizing each time interval into 10 3 log-spaced points.We consider 10 3 terms in each series expansion.
Moreover, we take $\ell = 1$ and, unless otherwise stated, $D = 1$. Finally, where more than one but finitely many initial beliefs are considered, we scale the probability functions according to the corresponding initial distribution as outlined above.

AGENT-BASED STOCHASTIC SIMULATIONS
a. One-dimensional drift diffusion equation. To test the analytical solutions, we solved Eq. (1) in the main text, which describes the evidence accumulation process preceding binary decisions, using the Euler-Maruyama method. In this approximation scheme, the true solution to the stochastic differential equation is approximated by a Markov chain $Y$ constructed by setting $Y_0 = X(0)$ and updating $Y$ according to the iterative scheme

$$Y_{n+1} = Y_n + \mu\,\Delta t + \sqrt{2D}\,\Delta W_n,$$

where $Y_n \equiv Y(n\Delta t)$ is the value of the Markov chain after the $n$th update, and the random variables $\Delta W_n$ are independent and identically distributed Gaussian random variables with mean 0 and variance $\Delta t$. The equations were integrated until $Y_n$ left the interval $(-\theta, \theta)$.
The temporal discretization, $\Delta t$, is user-defined. As $N$ grows, the time to first decision decays slowly. Thus, for large $N$, $\Delta t$ must be taken sufficiently small to represent the decision dynamics accurately. For simulations here, we chose $\Delta t = 10^{-3}$ for $1 \le N \le 1000$. For $N > 1000$, we chose $\Delta t = N^{-1}$.
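The scheme and the time-step rule described above can be sketched as follows. This is an illustrative implementation, not the authors' code: it assumes dynamics $dX = \mu\,dt + \sqrt{2D}\,dW$ with common $\mu$, $D$, and $\theta$ for all agents, and the function names `choose_dt` and `simulate_population` and the `max_steps` safeguard are introduced here.

```python
import numpy as np

def choose_dt(n_agents):
    """Time step rule quoted in the text: 1e-3 up to N = 1000, then 1/N."""
    return 1e-3 if n_agents <= 1000 else 1.0 / n_agents

def simulate_population(x0, mu=1.0, D=1.0, theta=1.0, dt=None, rng=None,
                        max_steps=10_000_000):
    """Euler-Maruyama for dX = mu dt + sqrt(2D) dW until every agent decides.

    Returns (decision_times, choices), with choices in {-1, +1}.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    n = x.size
    dt = choose_dt(n) if dt is None else dt
    times = np.full(n, np.nan)
    choices = np.zeros(n, dtype=int)
    active = np.arange(n)                      # indices of undecided agents
    step = 0
    while active.size > 0 and step < max_steps:
        step += 1
        dW = np.sqrt(dt) * rng.standard_normal(active.size)  # variance dt
        x[active] += mu * dt + np.sqrt(2.0 * D) * dW
        done = np.abs(x[active]) >= theta      # left the interval (-theta, theta)
        idx = active[done]
        times[idx] = step * dt
        choices[idx] = np.sign(x[idx]).astype(int)
        active = active[~done]
    return times, choices
```

Sorting `times` gives the decision order, so `choices[np.argsort(times)]` lists the choices from the first decider to the last. For a population of initially unbiased agents the fraction of correct choices should be close to $(1 + \exp(-\mu\theta/D))^{-1}$, up to sampling and discretization error.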
b. Two-dimensional drift diffusion equation. Decisions between three choices require a drift-diffusion model evolving on a planar domain [21]. Updating the discrete-time approximation of Eq. (21) for each observer (dropping the $i$ subscript) using Euler-Maruyama provides the following iterative scheme where $Y^j_n = Y^j(n\Delta t)$ is the value of the belief after the $n$th update, the random variables $\Delta W^j$ are Gaussian random variables with mean 0 and variance σ For the 2D case in an equilateral triangle, the threshold $\theta$ is taken to be equal to the length of the apothem, defined as a line from the center of a regular polygon at right angles to any of its sides. Hence, an unbiased agent begins at the center of the equilateral triangle.
We prescribe initial data for biased agents to be anywhere along an apothem except the center of the triangle.

FIG. 1. Initial bias determines the choice of early deciders. (A) Evolution of beliefs of $N = 10^4$ agents who each have even odds of initially being unbiased or biased ($P(X_j(0) = x_i) = 0.5$, $x_i \in \{0, -0.5\}$). The first agent (red) decides according to their initial bias, and makes the wrong decision at $T_1 \approx 0.01$. The last agent (blue) decides correctly at $T_{10000} \approx 10$. (B) Probability that the agent with the largest initial bias decides first as a function of population size, $N$. Solid curves were determined by numerical quadrature (Eq. (S4)) with initial biases assigned with uniform probability from values listed in the legend; black crosses denote results of a stochastic simulation averaged over $10^6$ trials. Inset: Log-log plot of the same results with dashed curves showing the asymptotic results in Eq. (3). Throughout, agents use identical thresholds $\pm\theta = \pm 1$, drift $\mu = 1$, and diffusivity $D = 1$.

FIG. 2. First decider accuracy is determined by its initial bias. (A) The accuracy of the first decider as a function of population size, $N$, for different initial biases, $y$, obtained by quadrature. Curves are ordered by the proximity of the initial bias $y$ of the first decider to the correct threshold $+\theta$. The drift, and hence the correct decision, are positive. (B) Under the same assumptions a small deviation from an unbiased initial belief strongly affects the probability of a correct first decision when $N$ is large. (C) Drift weakly affects the first decision in populations with biased agents ($y = \theta/4$ here) when $N$ is large. See SM for decision polarity formulas. (D) In large populations in which all agents have the same initial bias, $y = \theta/2$, but different diffusivities, early deciders (here first and third) have the shortest diffusive timescale. X's represent averages of stochastic simulations over $10^6$ trials.

FIG. 3. Late deciders make choices as if they held no initial bias. (A) For large $N$, decision accuracy monotonically increases with decision order. The accuracy of late deciders approaches the accuracy of a single, initially unbiased agent. Here, all agents have initial bias $\theta/3$, and on each trial, $P(H = H^+) = 0.5$. (B) In large groups even a large initial bias has no impact on the decision of later agents. Here, initial biases are sampled with uniform probability from $(-\theta, \theta)$.

FIG. 4. Bias impacts multi-alternative and two-alternative decisions similarly in large groups.
and similarly if |a| > b or |a| = b.With this short-time behavior of F E (t), we can then plug this into Theorem 3 to show that the first deciders have the most extreme initial beliefs.
$b = (\mu\ell/2D)^2$. By symmetry one can determine the corresponding probability density and cumulative distribution functions for hitting the right boundary. Altogether, we acquire long- and short-time expressions for the cumulative distribution function of an agent making a decision.
Specific details of figures with numerical solutions are as follows. In Fig. 1B we illustrate in color Eq. (10) where $F_E = F$ as defined above with $X_{n(1)}(0) = y$. The black curve, which contains the remaining mass of the total probability, is computed as the sum of the colored curves subtracted from one. In Fig. 2A-C we illustrate the probability that the first decider chooses the decision at $X(T_1) = \theta$ conditioned on having a particular initial bias. Hence, by definition of conditional probability, the numerical solutions are produced from quadrature on ratios of Eq. (10) with $F_E = F_1$ in the numerator and $F_E = F$ in the denominator with $X_{n(1)} = y$. The inset of Fig. 2C is one minus the outset. In Fig. 2D we illustrate Eq. (9) where $F_E = F$ and $S_{n(j)} = s$. In Fig. 3B we illustrate the probability that the last decider chooses the decision at $X(T_N) = \theta$ conditioned on having a particular initial bias. Similar to Fig. 2B, the numerical solutions are produced from quadrature on ratios of Eq. (11) with $F_E = F_1$ in the numerator and $F_E = F$ in the denominator with $X_{n(N)}(0) = y$.